feat: add jemalloc heap profiling infrastructure#449
feat: add jemalloc heap profiling infrastructure#449eudelins-zama wants to merge 11 commits intomainfrom
Conversation
Usage# Build and start env
make build-compose-heap-profiling
make start-compose-heap-profiling
# Keygen
cargo run --bin kms-core-client -- -f core-client/config/client_local_threshold.toml -l -a insecure-key-gen
# copy key id to be used below:
# First decryption burst to have representative memory usage after some use
cargo run --bin kms-core-client -- -f core-client/config/client_local_threshold.toml -l -a public-decrypt from-args --to-encrypt 0xC2 --data-type euint64 -b 1 -n 2000 --key-id $KEY_ID --inter-request-delay-ms 500 -p 40
# First memory dump
make dump-heap-profiles
# Second decryption burst
cargo run --bin kms-core-client -- -f core-client/config/client_local_threshold.toml -l -a public-decrypt from-args --to-encrypt 0xC2 --data-type euint64 -b 1 -n 2000 --key-id $KEY_ID --inter-request-delay-ms 500 -p 40
# Second memory dump
make dump-heap-profiles
# Compare memory usage for core-1 before/after the second bump
./profiling/analyze-heap.sh ./profiling/heap-dumps/kms-server ./profiling/heap-dumps/core-1/
# Analysis of the `./profiling/heap-analysis/diff-leaks.txt` file -> Claude is pretty good at this |
Consolidated Tests Results 2026-03-08 - 20:15:43Test ResultsDetails
test-reporter: Run #749
🎉 All tests passed!TestsView All Tests
🍂 No flaky tests in this run. Github Test Reporter by CTRF 💚 🔄 This comment has been updated |
Vulnerability Scan ResultsDetails |
profiling/analyze-heap.sh
Outdated
| fi | ||
| } | ||
|
|
||
| # sed -i: GNU sed uses -i '', BSD (macOS) sed requires -i '' |
There was a problem hiding this comment.
This seems to imply that they are the same?
There was a problem hiding this comment.
Fixed in 44b0f8a (they are not the same apparently, but doc was unclear indeed)
|
I've been trying this PR on mac OS but I can't figure out how to run pprof. The script But this is not super surprising since I've always had problems with profilers on mac OS. If anyone has ideas please let me know, otherwise I can try compiling gperftools from scratch and see if it gives me a |
|
Ok I installed pprof using golang https://github.com/google/pprof but then I got the error It seems not all bash implementation supports negative indexing |
|
@kc1212 |
|
Btw, feel free to push an update on macOS setup documentation if you manage to troubleshoot some issues on your side! |
well, the script is asking for |
|
Docs says: But indeed the script seems to accept |
pushed the changes, seems to be all ok now! |
|
Small note that you configure jemalloc to auto-dump to |
Description of changes
Add
jemallocheap profiling infrastructure to detect memory leaks in KMS core nodes.Bumped
trivy-actionto avoid this CI failure.What's included
Feature-gated jemalloc integration (
heap-profilingCargo feature):tikv-jemallocatorwhen the feature is enabledSIGUSR1signal handler that triggers on-demand heap dumps to/tmp/kms-heap/kms_jemalloc_allocated,kms_jemalloc_resident) via ajemalloc-statsfeature on theobservabilitycrate, enabling leak-type diagnosis (application leak vs allocator fragmentation vs non-jemalloc growth)Dedicated Cargo profile (
heap-profiling):releasewithdebug=1(line tables only) andstrip=nonesojeprofcan resolve addresses tofunction:linewithout the full DWARF overheadDocker Compose overlay (
profiling/docker-compose-heap-profiling.yml):-C force-frame-pointers=yesfor reliable backtracesMALLOC_CONFwithprof:true,lg_prof_sample:12,prof_gdump:true, andprof_final:trueMakefile targets:
build-compose-heap-profiling/start-compose-heap-profiling/stop-compose-heap-profilingdump-heap-profiles: sendsSIGUSR1to all 4 cores, copies.heapfiles, the binary, and/proc/PID/mapslocallyAnalysis script (
profiling/analyze-heap.sh):MAPPED_LIBRARIESpaths and injectingmaps.txt. Not tested on macOS though.top-leaks.txt,latest.svg,diff-leaks.txt(allocation sites that grew), anddiff.svg(diff flamegraph)addr2lineresolution whenjeprofcan't resolve symbolsDockerfile changes:
CARGO_EXTRA_FEATURESandRUSTFLAGSbuild args so the profiling stack can inject feature flags and frame pointersid=cargo-target-${LTO_RELEASE}) to avoid cache collisions between release and heap-profiling buildsConfig fix (separate commit):
resharefield to all compose config filesIssue ticket number and link
Closes https://github.com/zama-ai/kms-internal/issues/2927
PR Checklist
I attest that all checked items are satisfied. Any deviation is clearly justified above.
chore: ...).TODO(#issue).unwrap/expect/paniconly in tests or for invariant bugs (documented if present).heap-profilingprofiledevopslabel + infra notified + infra-team reviewer assigned.!and affected teams notified.Zeroize+ZeroizeOnDropimplemented.unsafe; if unavoidable: minimal, justified, documented, and test/fuzz covered.heap-profilingprofile (not used forreleaseprofile`profiling/analyze-heap.sh, I over-viewed it, seems good and it's working to generate meaningful dump comparison.Dependency Update Questionnaire (only if deps changed or added)
Answer in the
Cargo.tomlnext to the dependency (or here if updating):More details and explanations for the checklist and dependency updates can be found in CONTRIBUTING.md